
(ICLR 2015) Explaining and harnessing adversarial examples

Keyword [FGSM]

Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples[J]. arXiv preprint arXiv:1412.6572, 2014.



1. Overview


1.1. Motivation

  • ML and DL models misclassify adversarial examples; earlier explanations focused on nonlinearity and overfitting
  • generic regularization strategies (dropout, pretraining, model averaging) do not confer a significant reduction in vulnerability to adversarial examples

1.2. In This Paper

  • explains adversarial examples by the linear nature of neural networks
  • proposes the fast gradient sign method (FGSM) to generate adversarial examples
  • shows that adversarial training can provide a regularization benefit beyond that provided by dropout
  • observes that the same adversarial example is often misclassified by a variety of classifiers with different architectures or trained on different subsets of the training data
  • shows that training on adversarial examples regularizes the model

1.3. Summary

  • adversarial examples can be explained as a property of high-dimensional dot products
  • adversarial examples generalize across different models because different models learn similar functions when trained to perform the same task
  • adversarial training can provide regularization beyond that of dropout
  • models that are easy to optimize are easy to perturb
  • linear models lack the capacity to resist adversarial perturbation; only architectures with at least one hidden layer can be trained to resist it
  • ensembles are not resistant to adversarial examples

1.4. Linear Explanation of Adversarial Examples

  • the precision of an individual input feature is limited
    an 8-bit image encoding, for example, discards all information (η) below 1/255 of the dynamic range

  • the classifier is expected to assign x and the adversarial example x' = x + η to the same class as long as every element of η is smaller than the precision ε:

    $\|\eta\|_\infty < \varepsilon$

  • consider the dot product between a weight vector w and the adversarial example x':

    $w^\top x' = w^\top x + w^\top \eta$

  • the adversarial perturbation η causes the activation to grow by

    $w^\top \eta$

  • subject to the max-norm constraint on η, this growth is maximized by

    $\eta = \varepsilon \, \mathrm{sign}(w)$

  • if w has n dimensions and the average magnitude of an element of w is m, the activation grows by

    $\varepsilon m n$

  • the growth scales linearly with n: for high-dimensional inputs, many infinitesimally small per-feature changes can add up to one large change in the activation, as the sketch below demonstrates
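
A minimal NumPy sketch (not from the paper) of this growth argument: under a max-norm budget ε, the perturbation η = ε·sign(w) attains the maximal growth w^T η = εmn, which increases linearly with the dimension n.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.007  # small max-norm budget on each feature

for n in (100, 1_000, 10_000):        # input dimension
    w = rng.normal(size=n)            # weight vector of a linear model
    eta = eps * np.sign(w)            # optimal perturbation under ||eta||_inf <= eps
    growth = w @ eta                  # activation growth w^T eta
    m = np.abs(w).mean()              # average magnitude of an element of w
    print(f"n={n:6d}  growth={growth:9.3f}  eps*m*n={eps * m * n:9.3f}")
```

Each per-feature change stays at ε, yet the total change in the activation grows with n, so high dimensionality alone is enough to make a linear model vulnerable.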

1.5. Linear Perturbation of Non-linear Models

  • hypothesis: neural networks are too linear to resist linear adversarial perturbation; ReLUs, LSTMs, and maxout networks are deliberately designed to behave in very linear ways, and even sigmoid networks are tuned to spend most of their time in the non-saturating, more linear regime

1.5.1. Fast Gradient Sign Method

  • the perturbation:

    $\eta = \varepsilon \, \mathrm{sign}\!\left(\nabla_x J(\theta, x, y)\right)$

  • θ: parameters of the model
  • x: input to the model
  • y: targets associated with x
  • J(θ, x, y): cost function used to train the network
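
A minimal PyTorch sketch of FGSM under these definitions; the linear classifier, random data, and [0, 1] clamping are placeholder assumptions rather than the paper's maxout setup, while ε = 0.25 is the value the paper uses on MNIST.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast gradient sign method: x' = x + eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)        # J(theta, x, y)
    loss.backward()                            # populates x.grad with grad_x J
    x_adv = x + eps * x.grad.sign()            # step in the sign of the gradient
    return x_adv.clamp(0.0, 1.0).detach()      # keep features in the valid range

# toy usage with a placeholder linear classifier on MNIST-shaped inputs
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(4, 1, 28, 28)                   # stand-in batch of images
y = torch.randint(0, 10, (4,))                 # stand-in labels
x_adv = fgsm(model, x, y, eps=0.25)            # eps = 0.25 as in the paper's MNIST experiments
```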

1.6. Adversarial Training of Deep Networks

  • training objective (the paper uses α = 0.5):

    $\tilde{J}(\theta, x, y) = \alpha J(\theta, x, y) + (1 - \alpha)\, J\!\left(\theta,\; x + \varepsilon \,\mathrm{sign}(\nabla_x J(\theta, x, y)),\; y\right)$

  • a deep network can at least represent functions that resist adversarial perturbation (via the universal approximator theorem), whereas a shallow linear model cannot

Adversarial training of the maxout network did not work well at first because the model underfit; after making the model larger (more units per hidden layer), adversarial training worked well. A sketch of one training step follows.
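
A minimal PyTorch sketch of one step of adversarial training with the objective above; α = 0.5 and ε = 0.25 follow the paper, while the model, data, and optimizer settings are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

alpha, eps = 0.5, 0.25                          # alpha = 0.5 as in the paper
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # placeholder classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.rand(64, 1, 28, 28)                   # stand-in batch of images
y = torch.randint(0, 10, (64,))                 # stand-in labels

# build the FGSM counterpart of the batch
x_req = x.clone().detach().requires_grad_(True)
F.cross_entropy(model(x_req), y).backward()     # grad_x J(theta, x, y)
x_adv = (x_req + eps * x_req.grad.sign()).clamp(0, 1).detach()

# combined objective: alpha * J(x) + (1 - alpha) * J(x_adv)
loss = (alpha * F.cross_entropy(model(x), y)
        + (1 - alpha) * F.cross_entropy(model(x_adv), y))
opt.zero_grad()                                 # also clears grads left over from the FGSM backward
loss.backward()
opt.step()
```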